Using Probabilistic Views for Large-Scale Statistical Inference
Abstract
Probabilistic databases extend statistical inference from limited, hand-crafted statistical models to an entire database. Data analysts can discover trends, test hypotheses, and run what-if scenarios simply by running SQL queries. The technical challenge in a probabilistic database is the query processor, which needs to perform probabilistic inference for every row output by a SQL query; the general-purpose probabilistic inference algorithms used in this step do not scale beyond small or medium-sized databases. Overcoming this limitation will require major advances in the optimization of probabilistic inference in databases. In this talk, I will describe one line of research in this direction, which relies on a combination of probabilistic views and safe queries. Like a traditional view, a probabilistic view is defined by a SQL query, and like a probabilistic database, its rows are random variables; their probabilities are computed offline, presumably at high expense. "Safe queries" are a restricted class of SQL queries for which probabilistic inference can be done quite efficiently. The idea in this approach is to rewrite the user query as a safe query over the probabilistic views, thus benefiting from the probabilities that have been computed offline. This talk will give the necessary background on probabilistic databases and describe some of the technical challenges associated with probabilistic views.
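As a rough illustration of why safe queries can be evaluated efficiently, the sketch below assumes a tuple-independent database, which the abstract does not state, with two hypothetical tables R(x) and S(x, y) whose tuples carry made-up marginal probabilities. It computes the probability of the Boolean query "there exist x, y such that R(x) and S(x, y)" in two ways: with a plan built from independent projects and joins, which takes one pass over the data, and by enumerating every possible world, which is exponential in the number of tuples. The table contents and function names are illustrative only and are not taken from the talk.

```python
from itertools import product

# Hypothetical tuple-independent tables: each tuple maps to its marginal probability.
R = {"a": 0.5, "b": 0.8}
S = {("a", 1): 0.4, ("a", 2): 0.5, ("b", 1): 0.1}


def safe_plan_probability(R, S):
    """P(exists x, y: R(x) and S(x, y)) using independent project and join."""
    prob_no_witness = 1.0
    for x, p_r in R.items():
        # Independent project over y: probability that at least one S(x, y) is present.
        prob_no_s = 1.0
        for (sx, _y), p_s in S.items():
            if sx == x:
                prob_no_s *= 1.0 - p_s
        p_some_s = 1.0 - prob_no_s
        # Independent join: R(x) is independent of every S(x, y) tuple.
        prob_no_witness *= 1.0 - p_r * p_some_s
    # Independent project over x: different x values touch disjoint tuples.
    return 1.0 - prob_no_witness


def brute_force_probability(R, S):
    """Same probability, computed by summing over all 2^n possible worlds."""
    tuples = [("R", k, p) for k, p in R.items()] + [("S", k, p) for k, p in S.items()]
    total = 0.0
    for world in product([False, True], repeat=len(tuples)):
        weight = 1.0
        present_r, present_s = set(), set()
        for included, (rel, key, p) in zip(world, tuples):
            weight *= p if included else 1.0 - p
            if included:
                (present_r if rel == "R" else present_s).add(key)
        # The query holds in this world if some S(x, y) has a matching R(x).
        if any(x in present_r for (x, _y) in present_s):
            total += weight
    return total


if __name__ == "__main__":
    print(safe_plan_probability(R, S))
    print(brute_force_probability(R, S))
```

Both functions should agree up to floating-point error. The difference is cost: the safe plan is polynomial in the table sizes, while the enumeration doubles with every added tuple, which is the scaling problem that general-purpose inference runs into.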
Related resources
Variational Probabilistic Inference and the QMR-DT Database
We describe a variational approximation method for efficient inference in large-scale probabilistic models. Variational methods are deterministic procedures that provide approximations to marginal and conditional probabilities of interest. They provide alternatives to approximate inference methods based on stochastic sampling or search. We describe a variational approach to the problem of diagnos...
Variational Probabilistic Inference and the QMR-DT Network
We describe a variational approximation method for efficient inference in large-scale probabilistic models. Variational methods are deterministic procedures that provide approximations to marginal and conditional probabilities of interest. They provide alternatives to approximate inference methods based on stochastic sampling or search. We describe a variational approach to the problem of diagnos...
LPKP: location-based probabilistic key pre-distribution scheme for large-scale wireless sensor networks using graph coloring
Communication security of wireless sensor networks is achieved using cryptographic keys assigned to the nodes. Due to resource constraints in such networks, random key pre-distribution schemes are of high interest. Although most of these schemes do not consider location information, there are scenarios in which nodes can obtain location information after their deployment. In this paper,...
InferSpark: Statistical Inference at Scale
The Apache Spark stack has enabled fast large-scale data processing. Despite a rich library of statistical models and inference algorithms, it does not give domain users the ability to develop their own models. The emergence of probabilistic programming languages has shown the promise of developing sophisticated probabilistic models in a succinct and programmatic way. These frameworks have the...